-
-
Notifications
You must be signed in to change notification settings - Fork 17.9k
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Cythonized GroupBy Fill #19673
Cythonized GroupBy Fill #19673
Conversation
pandas/_libs/groupby_helper.pxi.in
Outdated
|
||
sorted_labels = np.argsort(labels) | ||
if method == 'bfill': | ||
sorted_labels[::-1].sort() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Why are the labels reversed before sorting?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
They only get reversed in the case of bfill
- it's the cheapest way to fill backwards that I could think of
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, but doesn't the .sort()
undo the reversing?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I took that from the below SO article and the comment by perimosocordiae was helpful in figuring out what's going on. Open to changing it to sorted_labels = np.sort(sorted_labels)[::-1]
for readability if you'd like
https://stackoverflow.com/questions/26984414/efficiently-sorting-a-numpy-array-in-descending-order
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Oh, thanks! Perhaps too clever, how about this? still avoids the copy
sorted_labels.sort()
sorted_labels = sorted_labels[::-1]
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Actually I think you are on to something - the extra sort
doesn't feel right and I think it could cause a bug as it will really just return an array ranging from N-1..0. Going to work up some test cases to validate and add fix to next commit
pandas/_libs/groupby_helper.pxi.in
Outdated
out[idx, 0] == {{nan_val}} | ||
filled_vals += 1 | ||
else: # reset items when not missing | ||
filled_vals = 0 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Won't filled_values
need to be tracked by group? Consider this case (haven't tested on your PR):
df = pd.DataFrame({
'k': [1, 1, 2, 2, 1, 1], 'v': [1.1, np.nan, 1, np.nan, np.nan, np.nan]})
df.groupby('k').ffill(limit=2)
Out[25]:
k v
0 1 1.1
1 1 1.1
2 2 1.0
3 2 1.0
4 1 1.1
5 1 NaN
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
It needs to be tracked by both group and value. One thing to keep in mind is that the main loop in the Cython function doesn't go sequentially over the values in the series / frame, but rather iterates over the argsorted
ed labels, which allows you to keep track of groups easily. Just ran your example on my PR and it gave the result you have above.
That said, I don't have a problem adding a test case that mixes the groups up to prevent this from breaking in the future. Can bundle that in with the next commit
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test case for this is now a little more complicated, handling both sequential groups (ex: ['a', 'a', 'a', 'b', 'b', 'b']) and "interwoven" (ex: ['a', 'b', 'a', 'b', 'a', 'b']). Open to splitting it up into separate tests if you think that would make it more readable.
Otherwise this instance should be covered. I fixed a bug in the Cython code to go along with it - thanks for the callout!
pandas/core/groupby.py
Outdated
@@ -1472,7 +1479,7 @@ def pad(self, limit=None): | |||
Series.fillna | |||
DataFrame.fillna | |||
""" | |||
return self.apply(lambda x: x.ffill(limit=limit)) | |||
return self.apply('ffill', limit=limit) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Rather than calling apply
, pattern so far has been to define the method directly Groupby.cumsum
, etc
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Tied to the comment below, I added _cython_apply
as its own function to handle dispatching back to the Cython layer, which this apply
tries first before falling back to the current method of iterating over the groups and applying the method directly from there. Items like Groupby.cumsum
call _cython_transform
in a similar fashion, which I was trying to emulate here but I was hesitant to send these methods down the transform route because of the complexity that would cause there
pandas/core/groupby.py
Outdated
@@ -2032,6 +2039,38 @@ def _get_group_keys(self): | |||
self.levels, | |||
self.labels) | |||
|
|||
def _cython_apply(self, ftype, data, axis, **kwargs): |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think all this should pass down to _cython_transform
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I was a little torn over this. The call signature doesn't play all that nicely with the other methods utilizing _cython_transform
. Specifically, the is_numeric
and is_datetimelike
arguments aren't applicable and there are some conditionals that need to be added to prevent accidental casting of data during the fill. On top of that, we'd have to wrap the transformation to be sure to include the grouped columns as none of the other transformations do that.
Touched on this with @jreback in the comments of #19481 - there for sure needs to be a pretty comprehensive refactoring on groupby.py
to more effectively dispatch Cython operations. For the time being I thought this approach would be the "best of the worst" option, but can take another look at wiring into transform if we feel that is a show stopper
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I don't disagree the whole thing needs some refactoring, xref also to #19354 - I'll think about it a bit more too
5e007f8
to
656cf77
Compare
One possibility to clean this up would be to replicate Line 1846 in 07137a5
Rather than type specific functions, there the cython routine computes a single indexer, then the normal pandas/pandas/_libs/groupby_helper.pxi.in Line 991 in 07137a5
Probably a bit slower than what you do here, but would be surprised if it's bad. |
Thanks for the idea. I think I could make something like that work here and that would definitely simplify things |
656cf77
to
4ba5994
Compare
Just simplified the code to mirror what we do for before after ratio
[07137a5a] [0939aeae]
+ 983ms 1.23s 1.25 groupby.GroupByMethods.time_method('float', 'pct_change')
+ 356±4μs 443±20μs 1.24 groupby.GroupByMethods.time_method('float', 'prod')
+ 84.6±0.2μs 101±3μs 1.19 groupby.GroupByMethods.time_method('int', 'count')
+ 185±1μs 218±3μs 1.18 groupby.GroupByMethods.time_method('int', 'cumcount')
+ 161±3μs 185±3μs 1.15 groupby.GroupByMethods.time_method('float', 'cumcount')
+ 334±9μs 382±20μs 1.14 groupby.GroupByMethods.time_method('float', 'median')
+ 461±3μs 523±10μs 1.13 groupby.GroupByMethods.time_method('int', 'cummax')
+ 344±0.8μs 387±10μs 1.12 groupby.GroupByMethods.time_method('float', 'nunique')
+ 64.5±0.2μs 72.1±0.9μs 1.12 groupby.GroupByMethods.time_method('int', 'size')
+ 456±2μs 504±20μs 1.11 groupby.GroupByMethods.time_method('float', 'cummin')
+ 133±5ms 146±3ms 1.10 groupby.GroupByMethods.time_method('int', 'any')
- 769±20μs 684±30μs 0.89 groupby.GroupByMethods.time_method('int', 'cumprod')
- 893±100μs 598±30μs 0.67 groupby.GroupByMethods.time_method('int', 'sem')
- 145±2ms 275±7μs 0.00 groupby.GroupByMethods.time_method('int', 'bfill')
- 154±3ms 245±1μs 0.00 groupby.GroupByMethods.time_method('int', 'ffill')
- 228±5ms 270±10μs 0.00 groupby.GroupByMethods.time_method('float', 'bfill')
- 237±5ms 232±0.5μs 0.00 groupby.GroupByMethods.time_method('float', 'ffill') |
5eebf6e
to
60a17bb
Compare
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
looks pretty good
pandas/_libs/groupby_helper.pxi.in
Outdated
|
||
N = len(out) | ||
|
||
sorted_labels = np.argsort(labels).view(dtype=np.int64) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
copy
as in #19701
pandas/core/groupby.py
Outdated
if limit is None: | ||
limit = -1 | ||
output = {} | ||
if type(self) is DataFrameGroupBy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
What's this if branch for?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Fill is unique in that unlike say transformations the column(s) used in the grouping are also returned as part of the output of a DataFrame object. _iterate_slices
does not go over those items and I didn't see an existing method to handle that, so this conditional makes sure the grouped columns still make their way into the frame that gets returned
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
FWIW I was thinking to myself if its worth adding an instance method to the GroupBy object to handle this (_iterate_groups
or something to the effect) but couldn't think of any other function that requires this feature
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than doing the if type(...)
, make a method in SeriesGroupBy and DataFrameGroupBy
60a17bb
to
e4bceba
Compare
Codecov Report
@@ Coverage Diff @@
## master #19673 +/- ##
=========================================
Coverage ? 91.65%
=========================================
Files ? 150
Lines ? 48962
Branches ? 0
=========================================
Hits ? 44875
Misses ? 4087
Partials ? 0
Continue to review full report at Codecov.
|
['all', 'any', 'bfill', 'count', 'cumcount', 'cummax', 'cummin', | ||
'cumprod', 'cumsum', 'describe', 'ffill', 'first', 'head', | ||
'last', 'mad', 'max', 'min', 'median', 'mean', 'nunique', | ||
'pct_change', 'prod', 'rank', 'sem', 'shift', 'size', 'skew', |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we bench for the compat methods for datetimelikes? (timestamp / timedelta), most of these would work (except prod, mean, mad, pct_change and a few more), though these likely work for timedeltas. ok with adding an issue to do this as well (IOW doesn't need to be in this PR).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
See #19733
doc/source/whatsnew/v0.23.0.txt
Outdated
@@ -646,6 +646,7 @@ Performance Improvements | |||
- Improved performance of pairwise ``.rolling()`` and ``.expanding()`` with ``.cov()`` and ``.corr()`` operations (:issue:`17917`) | |||
- Improved performance of :func:`DataFrameGroupBy.rank` (:issue:`15779`) | |||
- Improved performance of variable ``.rolling()`` on ``.min()`` and ``.max()`` (:issue:`19521`) | |||
- Improved performance of :func:`GroupBy.ffill` and :func:`GroupBy.bfill` (:issue:`11296`) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I am not sure this will render. you can just do .groupby().ffill()
and so on
pandas/_libs/groupby_helper.pxi.in
Outdated
with nogil: | ||
for i in range(N): | ||
idx = sorted_labels[i] | ||
if mask[idx] == 1: # is missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curr_fill_idx seems potentially not defined (if it is missing but the limit is not hit), not sure if that is possible. maybe could initialize curr_fill_idx = -1?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The initialization occurs at declaration. Can move that into the body of the function if you feel that is more readable
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh see that ok then
pandas/_libs/groupby_helper.pxi.in
Outdated
int64_t curr_fill_idx=-1 | ||
int64_t idx, filled_vals=0 | ||
|
||
N = len(out) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe want an assert that N == len(labels) == len(mask)
pandas/core/groupby.py
Outdated
if limit is None: | ||
limit = -1 | ||
output = {} | ||
if type(self) is DataFrameGroupBy: |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
rather than doing the if type(...)
, make a method in SeriesGroupBy and DataFrameGroupBy
pandas/core/groupby.py
Outdated
for name, obj in self._iterate_slices(): | ||
indexer = np.zeros_like(labels) | ||
mask = isnull(obj.values).view(np.uint8) | ||
libgroupby.group_fillna_indexer(indexer, mask, labels, how, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
maybe make a routine that shares code with how the shift_indexer is called (I am talking about in python space here)
pandas/_libs/groupby_helper.pxi.in
Outdated
with nogil: | ||
for i in range(N): | ||
idx = sorted_labels[i] | ||
if mask[idx] == 1: # is missing |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ahh see that ok then
pandas/_libs/groupby_helper.pxi.in
Outdated
curr_fill_idx = idx | ||
|
||
out[idx] = curr_fill_idx | ||
# If we move to the next group, reset |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
blank line here
e4bceba
to
aba1467
Compare
Latest commit includes a shared function between Open to review and plan to change the location of the func within the module / update docs, but I wanted to push because this commit will cause tests to fail at the following location: pandas/pandas/tests/groupby/test_transform.py Line 555 in 718d067
The reason this is failing is because I replaced I believe the test case is wrong anyway and the failing line should either have a |
its prob happenstance that this worked before, IOW it happened to be sorted already. go ahead and add what is needed to make it pass (and FYI make this an |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
any reason you want this to be a separate module at this point? it looks ok here
pandas/core/groupby.py
Outdated
------- | ||
GroupBy object populated with appropriate result(s) | ||
""" | ||
exp_kwds = collections.OrderedDict([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
these should be passed in directly (from the calling function), no? rather than hard coded inside this function.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, maybe the caller passes in a partial the kwargs baked rather than the string version of the funtion?
_get_cythonized_result(
partial(pd._libs.groupby.group_shift_indexer, nperiods=2),
...
)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I’ll think more about that. The one complicating factor is that masked values cannot be passed in directly as they are calculated on a per slice basis. To avoid a convoluted set of calls I figured it would make sense to have all keywords and positional arguments resolved within one function, but there’s certainly other ways to go about this
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chris-b1 comment is about the exp_kwds, I am still not clear why the caller cannot pass these
pandas/core/groupby.py
Outdated
if needs_ngroups: | ||
func = partial(func, ngroups) | ||
|
||
# Convert any keywords into positional arguments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
you can pass kwargs to cython functions, so not sure this is necessary
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I explicitly converted kwargs into positional arguments from the first comment back in #19481. If we are OK with kwargs in the Cython layer I can rewrite this, as it could simplify a few things
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you can pass kwargs
pandas/core/groupby.py
Outdated
base_func = getattr(libgroupby, how) | ||
|
||
for name, obj in self._iterate_slices(): | ||
indexer = np.zeros_like(labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
do we need to specify int64? (or platform int)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, int64
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This was a copy/paste of existing code, so I didn't give it too much thought. That said, labels
is already wrapped with a _ensure_int64
call as part of group_info
within the BaseGrouper
, so unless there's something in particular you know of I figure it would be easiest to inherit that type from labels and not override
pandas/core/groupby.py
Outdated
def _fill(self, direction, limit=None): | ||
"""Overriden method to concat grouped columns in output""" | ||
res = super()._fill(direction, limit=limit) | ||
output = collections.OrderedDict() |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
would be slightly simpler as a list-comprehension
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
though maybe this should be an arg to _fill? (IOW the option to do it should be done in fill)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do you mean as an arg to _get_cythonized_result
(fill would always use, unless we wanted to implement a new feature)? I was thinking about that but given fill
on a DataFrameGroupBy
is the only operation I know of at the moment that includes the grouping in the body of the returned object I figured it made the most sense to just perform that action there
pandas/core/groupby.py
Outdated
base_func = getattr(libgroupby, how) | ||
|
||
for name, obj in self._iterate_slices(): | ||
indexer = np.zeros_like(labels) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, int64
pandas/core/groupby.py
Outdated
------- | ||
GroupBy object populated with appropriate result(s) | ||
""" | ||
exp_kwds = collections.OrderedDict([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Right, maybe the caller passes in a partial the kwargs baked rather than the string version of the funtion?
_get_cythonized_result(
partial(pd._libs.groupby.group_shift_indexer, nperiods=2),
...
)
pandas/core/groupby.py
Outdated
indexer = np.zeros_like(labels) | ||
func = partial(base_func, indexer, labels) | ||
if needs_mask: | ||
mask = isnull(obj.values).astype(np.uint8, copy=False) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Contrary to the int64 case, you do want to use view
here - numpy doesn't see bool and uint8 as the 'same' type, so will always copy, which view elides.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks for the heads up - still trying to wrap my head around some of the nuances there. Out of curiosity, how do you know this? Just from experience or is there a good reference?
I do see that astype
documentation mentions dtype, order and subok
requirements need to be satisfied to prevent copy, but I'm not clear on what those requirements are and if they are documented
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Unfortunately all I can really point to is trial and error. Some "general" rules, at least in a pandas context
copy=False
will only elide a copy if the dtype is identical (may be contradictions to this, but at least not in general). Useful for conditional casts.arr.view(dtype)
never copies. Viewing across types should only be used if if the itemsize of the two types are identical. Primarily useful for casting a type with some metadata down to a a more primitive type (bool to uint8, datetime64 to int64).
To clarify my point about having a separate module for |
@WillAyd ahh perfect - yes absolutely in favor of s class dispatch mechanism |
Hello @WillAyd! Thanks for updating the PR. Cheers ! There are no PEP8 issues in this Pull Request. 🍻 Comment last updated on February 24, 2018 at 16:41 Hours UTC |
efac226
to
8af12e3
Compare
pandas/core/groupby.py
Outdated
@@ -1457,6 +1457,15 @@ def expanding(self, *args, **kwargs): | |||
from pandas.core.window import ExpandingGroupby | |||
return ExpandingGroupby(self, *args, **kwargs) | |||
|
|||
def _fill(self, direction, limit=None): | |||
# Need int value for Cython |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
can you add a doc-string
pandas/core/groupby.py
Outdated
------- | ||
GroupBy object populated with appropriate result(s) | ||
""" | ||
exp_kwds = collections.OrderedDict([ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@chris-b1 comment is about the exp_kwds, I am still not clear why the caller cannot pass these
pandas/core/groupby.py
Outdated
if needs_ngroups: | ||
func = partial(func, ngroups) | ||
|
||
# Convert any keywords into positional arguments |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
yes, you can pass kwargs
8bf59dc
to
2d78efc
Compare
pandas/_libs/groupby_helper.pxi.in
Outdated
int64_t lab, idxer, idxer_slot | ||
int64_t[:] label_seen = np.zeros(ngroups, dtype=np.int64) | ||
int64_t[:, :] label_indexer | ||
|
||
periods = kwargs['periods'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
hmm, if you are not doing a .get()
here, is there a reason you are not simply haveing periods=None
in the signature? (if its optional)?
pandas/_libs/groupby_helper.pxi.in
Outdated
ndarray[int64_t] sorted_labels | ||
int64_t limit, idx, curr_fill_idx=-1, filled_vals=0 | ||
|
||
direction = kwargs['direction'] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
same again why not simply list them in the signature? (you can have **kwargs if you want to ignore other ones)
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I didn't want to define a default value for something like periods
in both the Python and Cython layers, leaving the former to set that default and be responsible for passing to Cython.
It's a pretty trivial change so will re-push with your changes reflected soon
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Think I've been looking at this change too long...planning on making these required in the Cython signature and using the existing defaults in the Python layer to populate those via **kwargs
. If we are not on the same page let me know
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no that sounds good
2d78efc
to
42dceac
Compare
422c0f3
to
eff6603
Compare
Travis failure on latest commit looks unrelated (something with pyarrow?) |
thanks @WillAyd very nice as always! |
git diff upstream/master -u -- "*.py" | flake8 --diff
I am not a fan of how I've implemented this in
groupby.py
but I think this change highlights even more the need for some refactoring of that module, specifically in how Cython transformations are getting dispatched. This still works in the meantime but open to any feedback on how the methods are getting wired back to the Cython layer.Below are ASVs for the change